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Abstract: 

First, three successful environments for artificial evolution in computer systems are analysed 
briefly. The organism in these enviroment are in a virtual machine with special chemistries. 
Two key-features are found to be very robust under mutations: Non-direct addressing and 
separation of instruction and argument. 

In contrast, the x86 instruction set is very brittle under mutations, thus not able to 
achieve evolution directly. However, by making use of a special meta-language, these two 
key-features can be realized in a x86 system. This meta-language and its implementation is 
presented in chapter 2. 

First experiments show very promising behaviour of the population. A statistically analyse 
of these population is done in chapter 3. One key-result has been found by comparison of the 
robustness of x86 instruction set and the meta-language: A statistical analyse of mutation 
densities shows that the meta-language is much more robust under mutations than the x86 
instruction set. 

In the end, some Open Questions are stated which should be addressed in further re- 
searches. An detailed explanation of how to run the experiment is given in the Appendix. 
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1. Overview 



1.1. History 



1.1.1. Core World 



Artificial evolution for self-replicating computer codes has been introduced for the first time 
in 1990, when Steen Rasmussen created CoreWorld. 1^ Core World is a virtual machine which 
can be controlled by a language called RedCode. This assembler-like language has a pool 
of ten different instructions that take two addresses as arguments. Rasmussen's idea was 
to introduce a random flaw of the MOV command, resulting of random mutations of the 
self-replicating codes within the environment. The big disadvantage of RedCode was, that 
nearly all flaws led to a lethal mutation, hence evolution did not occure as wished. 



In 1992, Tom Ray found out that the problem with RedCode was due to argumented 
instruction set: Independent mutations in the instruction and its arguments are unlikely to 
lead to a meaningful combination. [2 Instead of direct connection between the instruction 
and its argument, Ray developed a pattern-based addressing mechanism: He introduced 
two NOP-instructions (NOPO and NOPl). These instructions do not operate themselve, but 
can be used as marker within the code. A pattern-matching algorithmus would find the 
first appearence of a complementary marker string given (after the search-command), and 
returns its addresse. 

PUSH_AX ; push ax 



1.1.2. Tierra 



JMP 



NOPO 



NOPl 



NOPO 



jmp markerlOl 



INCJ\ 



mc ax 



NOPl 



NOPO 



NOPl 



markerlOl : 



POP_CX 



pop cx 



There are 32 instructions available in the virtual Tierra world, roughly based on assembler 
(jmp , PUSH_AX , INC_B and so on) . With these inventions, Ray was able to gain great results 
for artificial evolution (like parasitism, multi-cellularity|3^t4, j ■•■)• 



1.2 Evolutionary Properties of different Chemistries 
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1.1.3. Avida 

In 1994, Christoph Adami has developed another artificial evolution simulation, called Avida. 
Beside of some different structures of the simulation, an important change has been made 
in the artificial chemistry: Instead of hardcoded arguments within the instructions (as in 
Tierra for example PUSH_AX), instructions and arguments are completely separated. The 
arguments are defined by NOPs (in avida there are three NOPs: nop-A, nop-B, nop-C) 
following the operation (for example, a nop-A following a PUSH pushes the AX-register to 
the stack). There are 24 instructions available in avida, again roughtly based on assembler 
(call, return, add, sub, allocate and so on). 

push 

nop-A ; push ax 

jump-f 

nop-A 

nop-B 

nop-B ; jmp markerBCC 

inc 

nop-A ; inc ax 

nop-B 

nop-C 

nop-C ; markerBCC: 

pop 

nop-B ; pop bx 



With that improvements of the virtual simulation, the researchers using avida found out 
amazing results, among other things about the origin of complex features in organism[5]. 

1.2. Evolutionary Properties of different Chemistries 

In 2002, an detailed analyse about different artificial chemistries has been published[S]. The 
authors compare several different instruction sets for evolutionary properties as Fitness and 
Robustness {R = ^ where N is the number of non-lethal mutations and M is the number 
of all possible mutations). 

The Chemistry I consists of 28 operations and has total seperated operations and argu- 
ments (same as Avida). Chemistry II has 84 unique instructions and seperated operations 
and arguments. The last Chemistry III has 27 instructions, but within the instructions 
the argument is given (i.e. push- AX, pop-CX, . . .). As a result, it has been found that 
Chemistry I is much robuster and can achieve a much higher fittness than Chemistry II, 
Chemistry III is the worst language for evolution. 



1.3 Biological Information Storage 



1.3. Biological Information Storage 

The information in natural organism is stored in the DNA. The DNA is roughly speaking a 
string of nucleotides. There arc four nucleotides - Adenine, Cytosine, Guanine and Thymine. 
Three nucleotides form a codon, and are translated by tRNA to amino acids. Amino acids 
are the building blocks of proteins, which are main modules of cells. 

One can calculate that there are N ~ 4'* = 64 possibilities how you can sort the codons - 
one could code 64 different amino acids. But nature just provides 20 different amino acids, 
hence there is big redundancy in the translation process. 
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There is a connection between the frequency of the amino acid in the genom and the 
redundancy of the translation process. This mechanism protects the organism from conse- 
quences of mutation. Imagine an Alanine (ALA) codon GCU will be mutated to GCC, this 
codon will still be translated to Alanine, thus there is no effect. 



2. Artificial Evolution in x86 



2.1. Chemistry for x86 

The aim is to create an evolvable chemistry for a native x86 system. So far, aU noteable 
attempts have been done in virtual simulated plattforms, where the creator can define the 
structure and the embedded instruction set. 

On the other hand, the x86 chemistry has been defined long time ago and appears to be 
not very evolution-friendly. The instruction set is very big, the arguments and operations are 
directly connected, there is no instruction-end marker or constant instruction-size. Hence, 
selfreplicators are very brittle in that environment, and almost all mutations are lethal. 

A possibility to avoid the bad behaviour of the x86 instruction set concerning mutations 
is to create a (at best Turing-complete) meta-language. At execution, the meta-language 
has to be translated to x86 assembler instructions. 

Here, a meta-language is presented with a eight bit code for each instruction, which will 
be translated to x86 code at execution. Obviously, this is the same procedure as in Protein 
biosynthesis. The eight bits coding a single instruction in the meta-language are analogs of 
the three codons representing one amino acid. At execution they are translated to a x86 
instruction - just as tRNA transformes the codon to a amino-acid. A punch of translated 
x86 instructions form a specific functionality, in biology a number of amino acids form a 
protein (which is responsible for a certain functionality in the organism). 

2.2. The instruction set 

One intention was to create an instruction set which can be translated to x86 instructions 
in a very trivial way. This was a noteable restriction as key instructions used in Tierra 
and avida (like search-f , jump-b, dividate, allocate, . . .) can not be written in a 
simple way in x86 assembler. 

The main idea of the meta-language is to have a number of buffers which are used as 
arguments of all operations. Registers are not used directly as arguments for instructions, 
but have to be copied from/to buffers, leading to a seperation of operation and argument. 
The instructions have a very similar form as in avida; comparing nopsA & push vs. push 
& nop-A, or pop & nopdA vs. pop & nop-A or nopsA & inc & nopdA vs. inc & nop-A for 
the meta-language and avida, respectively. 

It has emerged that it is enough to use three registers (RegA, RegB, RegD), two buffers 
for calculations and operations (BCl, BC2) and two buffers for addressing (BAl, BA2). 



2.2 The instruction set 
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Buffer 


instructions 


(16) 








nopsA 


BCl=RegA 




mov 


ebx. 


eax 


nopsB 


BCl=RegB 




mov 


ebx. 


ebp 


nopsD 


BCl=RegD 




mov 


ebx. 


edx 


nopdA 


RegA=BCl 




mov 


eax. 


ebx 


nopdB 


RegB=BCl 




mov 


ebp. 


ebx 


nopdD 


RegD=BCl 




mov 


edx. 


ebx 


saveWrtOf f 


BA1=BC1 




mov 


edi. 


ebx 


save JmpOf f 


BA2=BC1 




mov 


esi. 


ebx 


writeByte 


byte [BAl] = 


{BC1&&0XFF) 


mov 


byte [edi ] , bl 


writeDWord 


dword[BAl] 


=BC1 


mov 


dword[edi], ebx 


save 


BC2=BC1 




mov 


ecx. 


ebx 


addsaved 


BC1+=BC2 




add 


ebx. 


ecx 


subsaved 


BC1-=BC2 




sub 


ebx. 


ecx 


getDO 


BCl=DataOf f set 


mov 


ebx. 


DataOff set 


getdata 


BCl=dword[BCl] 


mov 


ebx. 


dword [ebx] 



getEIP 



BCl=instruction pointer call gEIP; gEIP: pop ebx 



Operations (10+8) 



zerO 


BC1=0 


mov 


ebx. 


0x0 


push 


push BCl 


push ebx 




pop 


pop BCl 


pop 


ebx 




mul 


RegA*=BCl 


mul 


ebx 




div 


RegA/=BCl 


div 


ebx 




shl 


BCl « (BC2&&0XFF) 


shl 


ebx. 


cl 


shr 


BCl » (BC2&&0XFF) 


shr 


ebx. 


cl 


and 


BC1=BC1&&BC2 


and 


ebx. 


ecx 


xor 


BC1=BC1 xor BC2 


xor 


ebx. 


ecx 


addOOOl 


BCl+=Oxl 


add 


ebx. 


0x0001 


add0004 


BCl+=0x4 


add 


ebx. 


0x0004 


addOOlO 


BCl+=OxlO 


add 


ebx. 


0x0010 


add0040 


BCl+=0x40 


add 


ebx. 


0x0040 


addOlOO 


BCl+=OxlOO 


add 


ebx. 


0x0100 


add0400 


BCl+=0x400 


add 


ebx. 


0x0400 



2.2 The instruction set 
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addlOOO 
add4000 
subOOOl 



BCl+=OxlOOO 
BCl+=0x4000 
BC1-=1 



add ebx, 0x1000 
add ebx, 0x4000 
sub ebx, 0x0001 



Jumps (4) 



JnzUp 
JnzDown 
JzDown 
ret 



jz over && jmp esi && over: 

jnz down {&& times 32: nop) && down; 

jz down (&& times 32: nop) && down: 

ret 



API calls (11) - Windows based 



CallAPIGetTickCounter 


stdcall 


[GetTickCount] 


CallAPIGetCommandLine 


stdcall 


[ GetCommandLine ] 


CallAPICopyFile 


stdcall 


[CopyFile] 


CallAPICreateFile 


stdcall 


[CreateFile] 


CallAPIGetFileSize 


stdcall 


[GetFileSize] 


CallAPICreateFileMapping 


stdcall 


[CreateFileMapping] 


CallAPIMapViewOfFile 


stdcall 


[MapViewOfFile] 


CallAPICreateProcess 


stdcall 


[CreateProcess] 


CallAPIUnMapViewOfFile 


stdcall 


[UnMapViewOfFile] 


CallAPICloseHandle 


stdcall 


[CloseHandle] 


CallAPISleep 


stdcall 


[Sleep] 



There are 30+8 unique commands (the eight addNNNN and subOOOl could be reduced to 
one single command, but this would make the code very big) and 11 API calls - giving 
49 instructions. For translation, a command is identified by 8bits. Therefore there are 
N = 2^ = 256 possible combinations, thus there is a big redundancy within the translation 
of commands to x86 code - just as in natural organism. This gives the code a big freedom 
in protecting itself against harmful effects of mutations. 

2.2.1. An example: Linear congruential generator 

The following code creates a new random number (Linear congruential generator) via 

Xn+i = {axn + c) mod m 
with a = 1103515245, c = 12345 and m = 2^2 (these are the numbers used by GCC). 



2.3 Translation of meta-language 
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. data 

DataOf f set : 

SomeData dd 0x0 
RandomNumber dd 0x0 



. code 

macro addnumber arg { . . . } 

; Creates the correct addNNNN combination 



getDO 
add0004 

getdata ; mov ebx, dword [RandomNumber ] 

nopdA ; eax=dword [RandomNumber ] 



zerO 

addnumber 1103515245 ; mov ebx, 1103515245 

mul ; mul ebx 



zerO 

addnumber 12 34 5 

save ; mov ecx, ebx 



nopsA 
addsaved 

nopdB ; mov ebp, (1103515245* [RandomNumber] +12345) 

; ebp=new random number 

getDO 
add0004 

saveWrtOff ; mov edi, RandomNumber 



nopsB 

writeDWord ; mov dword [RandomNumber ] , ebp 

; Save new random number 



.end code 



2.3. Translation of meta-language 

As the instruction set has been created to construct a trivial translator, the translator can 
be written as a single loop. A meta-language instruction is one byte, the corresponding 
x86 instruction has 8 bytes (for 256 instructions, this gives a 8 * 256 = 2.120 Byte long 
translation table). 

The Translator picks one 8bit codon, searchs the corresponding x86 instruction and writes 
that x86 instruction to the memory. At the end of all codons, it executes the memory. 



2.3 Translation of meta-language 
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invoke 

mov 

mov 



VirtualAlloc, 0x0, 0x10000, 
[Place4Life] , eax 
edx, 0x0 



WriteMoreToMemory : 

mov ebx, 0x0 



cmp 
jne 



mov bl, byte [edx+StAmino] 

shl ebx, 3 

mov esi, StartAlphabeth 

add esi, ebx 

mov ebx, edx 

shl ebx, 3 

mov edi, [Place4Life] 

add edi, ebx 

mov ecx, 8 

rep movsb 



inc edx 

edx, (EndAmino-St Amino) 
WriteMoreToMemory 



0x1000, 0x4 

64 KB RAM 

EDX will be used as the 
counter of this loop 

EBX=0 

BL=NUMBER OF AMINO ACID 
EBX*=8; 

Alphabeth offset 

offset of the current amino acid 
current number of amino acid 
lenght of amino acids 

Memory address 
Offset of current memory 
ECX=8 

Write ECX bytes from ESI to EDI 
Write 8 bytes from Alphabeth 
to Memory 
Increase EDX 



call 



[Place4Life] 



; Run organism! 



The Translation Table/ Alphabeth has the following form: 

; 0001 1000 - 24: 
_getEIP EQU 24 
ACommand2 4 : 

call gEIP 
gEIP : 
pop ebx 

ECommand2 4 : 

times (8-ECommand2 4+ACommand2 4 ) : nop 



; 0001 1001 - 25: 
_JnzUp EQU 25 
ACommand25 : 

jz over 
jmp esi 
over : 

ECommand2 5 : 

times (8-ECommand2 5+ACommand2 5) : nop 



3. Experiments 

To achieve evolution it is necessary to have replication, mutation and selection. 

3.1. Overview 

An ancestor organism has been written, which is able replicate itself. It copies itself in the 
current directory to a random named file and execute its offspring. 

The mutation-algorithm is written within the code (not given by the plattform as it 
is possible in Tierra or avida). With a certain probability a random bit within a special 
interval of the new file flips. Each organism can create five offspring, each with a different 
intervall and probability of mutation. 

For finding an adequate mutation probability, one can calculate the probability P that at 
least one bit-flip occures giving a N byte interval and a probability phu for a single bit to 
flip: 

N~l 

P{N,PMt) = J2 Pb^t{l'Pb^t^ = 1 - {l-pMtT 
n=0 
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The second offspring has also the opportunity to change the alphabeth. This could lead 
to redundancy in the alphabeth to avoid negative effects of mutations (as used in nature - 



descriped in 1.3). The mutations in the third offspring can access the whole file. 

Natural selection is not very strong in this experiment, CPU speed and harddisk space 
is limited. Thus, most non-lethal mutations are neutral and disribute randomly within the 
population. This can be used very easy to understand the relationship of the population: 
The smaller their Hamming distance, the nearer their relationship. 

The Hamming distance /S.{x,y) is defined as 

Beside of natural selection, their could be artificial selection. Some artificial selection has 
been used to prevent some wired behaviour of the populations. 



3.2 First attempts 
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The experiments have been done on a native WindowsXP. For stabilization, several small 
C++ guard programs have been developed which searches and closes endless- loops, closes 
error messages and dead processes (processes that live longer than a certain time). 

3.2. First attempts 

The first attempt has shown some unexpected behaviour. 

Multiple instances of same file: Already after a few dozen of generations, the process 
list started to fill with multiple instances of the same file. An analyse of the file shows that 
this happens due to a mutation corrupting the random name engine. The random name 
engine always generates the same filename (for instance aaaaaaaa.exe). After the mutation 
process (which has no effect as the file has write protection due to execution) the new/old 
file is executed again. 

To prevent this unnatural behaviour, it has become necessary to include an artificial 
selection to the system. The new C++ file scans the process list for multiple file instances, 
and closes them. 

It is interesting to see that this is a real selection, not a restriction to the system. Mutation 
still can create such effects, but with that additional guard file, they have negative conse- 
quences for the file (the will be deleted immediatly) and therefore will not spread within the 
population. 

Avoid mutations: The first long-time experiment appeared to be very promising. The 

guard files closed error messages, endless loops, dead processes and multiple instances of the 
same file. After some hundred generations the experiment has been stopped and the files 
have been analysed. Surprisingly, all files had the exact same bit-code, they were all clones. 

There has been a mutation in the alphabeth, changing the xor instruction. This instruc- 
tion is responsible for changing a bit at the mutation-process. If the mutation does not work 
anymore, no files will change anymore. 

For the organism, this is a big advantage. All offspring will survive as no more mutation 
happens. Other organism often create corrupt offspring, hence spread slower. After a while, 
the whole system is dominated by unmutable organism. 

In nature, organism also created very complex systems to prevent mutation or mutational 
effects. DNA repairing or amino acid redundancy are just two examples. 

Even this discoverment is very interesting and has a great analogon in nature, it prevents 
from further discoverments in this artificial system. Therefore another guard file has been 
developed, which scans the running files for clones and deletes them. 

It's not natural to prohibit clones at all, thus a adequate probability should be found. If 
there are 42 clones in the process list, they should be detected by a probability of 51% in 
one guard file cycle. This gives a probability of P = that a running file will be checked 
whether it has clones. A controlled file will be compared to all other running files, all clones 
will be deleted. 



3.3 Statistical analyse of experiment 
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3.3. Statistical analyse of experiment 

Afer installing the new guard file, a further experiment has been run. This first "long-term" 
experiment can be analysed statistically by comparing the density and type of mutation 
from the ancestor file and the latest population. Unforunatlcy it is very hard to determinate 
the number of generations in the population; by comparing mutations in the oldest 
population with the primary ancestor and using the mutation probability, one could 
speculate that there have been 400-600 generations. 

Number of mutations - ancestor vs. successor: A number of 100 successor have been 

randomly picked and compared with the ancestor. One can calucatc the average number of 
mutation during the lifetime of the experiment, and its standard deviation: 



1 " 

X = - V Xj = 192.02 Mutations 

n ^ — ^ 



1 

\ i=i 



The standard deviation gives an unexpected small value, which means that the number of 
mutations is quite constant over the lifetime of the population. 



Relations between individua: One can analyse the relation beween the individua by 
calculating their Hamming distance (the number of differences in their bytecode). A number 
of six files have been selected randomly and analysed. 
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3.3 Statistical analyse of experiment 



14 



While a.exe, b.exe and e.exe are near related, c.exe is far away from all other files, 
d.exe and f .exe are medium related. Interestingly, while c.exe has the biggest distance 
to all other successors, it has the smallest distance to the ancestor. 

Distribution of mutations: It is interesting to see which mutations are rare and which 
are widely spread within the population. There are 153 mutations which appeare in every 
single file, 32 mutations appearing in 84 files and so on. Many mutations are located in 
unused areas of the file, for instance in the Win32 .EXE padding bytes or in the unused 
part of the alphabeth. A list of mutations of the active-code (whether the used part of the 
alphabeth or the meta-language code) and its appearence in the population is given here. 
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1126 : 


100 


139c: 


100 


1148 : 


84 


105c: 


1 


d02: 


100 


118f : 


100 


13b3: 


100 


127d: 


84 


1085: 


1 


d4f : 


100 


1194 : 


100 


13d5: 


100 


12a8: 


84 


lOef : 


1 


d5d: 


100 


lla2 : 


100 


13eb: 


100 


130f : 


84 


12dl : 


1 


d7a: 


100 


lla9: 


100 


13fd: 


100 


1388 : 


84 


12ee: 


1 


d7d: 


100 


llbl : 


100 


1430 : 


100 


1392 : 


84 


1323 : 


1 


e3b: 


100 


124d: 


100 


144c: 


100 


13e4 : 


84 


1353: 


1 


e49: 


100 


1265: 


100 


1459: 


100 


c9d: 


32 


139a: 


1 



A full analyse of these mutations would be worthwhile, but has not be done in this primary 
analyse due to its great effort. However, to understand this system and its prospects better, 
detailed code analyse will be unavoidable. 

Nevertheless, examples of two mutations can be given. 

First one is Byte 0x527: This is within the alphabeth, defining the behaviour for the 
JnzUp instruction. A bit-flip caused following variation: 

jz over jz over 

jmp esi — >■ jmp esi 
over: nop 

over : 
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This has no effect m the behaviour, but effects just the byte-code - a neutral mutation. 

The second example is the mutation in Byte 0xC7B, which is within the meta-language 
code. The unmutated version is the instruction addOOOl, the mutated one represents 
add0004. This is part of the addnumber 26 instruction, which is used as modulo number 
for the random name generator. 

Due to this mutation, the genom not just picks letters from a~ z for its offpring's filename, 
but also the next three in the ASCII list, {, | and }. Thus, filenames can also contain these 
three characters. This mutation has an effect of the behaviour, but still seems to be a neutral 
mutation. 

3.4. Comparing Robustness with x86 instruction set 

In 2005 a program called Gloeobacter violaceus has been developed, that uses artificial 
mutations in the x86 instruction set, without making use of a meta-language. [7j That 
program also replicates in the current directory, and is subjected by point mutations, and 
rarely by inseration, deletion and dublication. Due to the brittleness of the x86 instruction 
set, that attempt was not very fruitful. Still it gives a good possibility of comparison. 

Both systems have changed to the same initial situation: Point mutations occure in 
the whole file with same probability. After several hundreds of generations, all non-minor 
mutations (occure in more than 50 different files) of 2.500 files have been analysed. The 
mutations have been classified by their position: x86-code mutations, mutation in some 
padding region or in the meta-language code. 

Through this classification we find out whether the new meta-language concept is more 
robust than the x86 instruction set. 

We define the mutation density of a specfic region in the code by 



Pmut (Region) 



mutations in region 



size of region 



Meta-Language concept: 



Pm«t (whole code) 



291 



6144 
151 



= 0.047 



Pmut (padding) 



2427 
81 



= 0.062 



/ci™„i(meta-code) 



2084 
14 



= 0.039 



Pm«t (x86) 



576 



= 0.024 



The p„mt(x86) combines the very small translator code and the alphabeth, but as the 
alphabeth is no real x86 code, comparing this advisable. If that problem would be neglected. 
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one would see that the meta-language is more robust under mutations as the x86 code. For 
a fair comparison Gloeobacter violaceus can be used. 
Gloeobacter violaceus: 

351 

Pmut (whole code) = = 0.098 

284 

Pm«t (padding) = = 0.127 
Pr««f(x86) = ^ = 0.015 



Mutations in the padding bytes do not corrupt the organism, thus it is the initial mutation 
density. A comparison between pm„t (padding) and /9m„t (Region) gives the percentage of 
non-lethal mutations in that region, therefore gives the robustness R of that region. 

Robustness (Region) := 

Prrmt(paddmg) 

The interesting comparison is between the x86 region and the meta-language region. 

Robustness(x86) = ^^^f,^^ , = = 0.115 

Pmut (paddmg) 0.127 

Robustness(meta-code) = Pmm{^''^-code) ^ 0039 ^ ^^^^ 
^ ^ p„„t(paddmg) 0.092 

Even this analysis is based on low statistics, it already indicates a great result: 

This new meta-language concept for x86 systems is much more robust than the 
original x86 instruction set. 



4. Outlook 



4.1. Open questions 

Development of new functionality: The most important question is whether an artificial 
organism with this meta-language in a x86 system can develope new functionalities. 

In a long-term evolution experiment by Richard Lenski, they discovered that simple E.coli 
was able to make a major evolutionary step and suddenly acquired the ability to metabolise 
citrate. |H] This happened after 31.500 generations, approximativly after 20 years. 

The generation time of artificial organism are of many orders of magnitude smaller, there- 
fore beneficial mutation such as development of new functionality may occure within days 
or a few weeks. The question that remains is whether this meta-language concept is the 
right environment or not. 

Other types of mutations: Point mutation is one important type of mutation, but not 
the only one. In DNA, there is also Deletion, Duplication, Inseration, Translocation, Inver- 
sion. Especially inseration of code and deletion of code is proved to be important in artificial 
evolution tocfS] The question is how one can create such a type of mutation without file 
structure errors occuring after every single mutation. 

One possibility would be to more the n last byte of the meta-language code forwards 
(deletion) or backwards (inseration), filling the gap with NOPs. However, how could you find 
out where the end of the meta-language code is without some complex (and thus brittle) 
functions? 

Behaviour of Hamming distance: How is the time evolution of the average Hamming 
distance between a population and the primary ancestor? Does it have a constant slope or 
is it rather like a logarithm? How is the behaviour of the Hamming distance when taking 
into account other types of mutations (as descriped above)? Large-scale experiments are 
needed to answer that questions propriatly. 

APIs: This is an operation system specific problem, and can not be solved for any OS at 
once. For Windows, the current system of calling APIs is not very natural. It is a call to a 
specific addresse of a library, needing the right numbers of arguments on the stack and the 
API and library defined in the file structure. Hence, API calls are not (very) evolvable in 
this meta-language, restricting the ability to use new APIs by mutations. 

One possible improvement could be the usage of LoadLibraryA and GetProcAddress, 
which loads the APIs from the kernel autonomous. This technique would not need the APIs 
and libraries saved within the file structure, and could make it possible to discover new 
functionalities. Unfortunately, this requires complex functions, which may be very brittle 
and unflexible. 

Still it needs more thoughts to find an adequate solution to this problem. 
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4.2. Computer malware 

This technique could be used in autonomous spreading computer programs as computer 
viruses or worms. This has been discussed in a very interesting paper by Ihopoulos, Adami 
and Szor in 2008. [10] 

Their main results are: The x86 instruction set does not allow enough neural mutations, 
thus it is impossible to develope new functionalities; a 'evolvable' language or a meta- 
language would be needed. Further, together with smaller generation times, the selective 
pressure and the mutation rate would be higher, speeding up evolution. Conclusion is, that 
it is currently unclear what would be a defence against such viruses. 

In contrast to the experiment explained in chapter [3]- where natural selection was nearly 
absent, computer malware are continuously under selective pressure due to antivirus scan- 
ners. This is the same situation as in biological organism, where parasits are always attacked 
by the immune system and antibiotics. 

Theoretically, computer malware could also find new ways to exploit software or different 
OS APIs for spreading. This is not as unlikely as it seems in the first moment. Experi- 
ments with artificial and natural evolution have shown that complex features could evolve 
in acceptable time. [S] [5] 

4.3. Conclusion 

An artificial 'evolvable' meta-language for x86 Systems has been created using the main 
ideas of Tierra and avida: Separation of operations and arguments, and not using direct 
addressing. The experiments have been very promising, showing that the robustness of the 
new meta-language is approximately four times higher as for usual x86 instructions. Several 
open questions are given in the end, which should motivate for further research. 

Howsoever, the most important step has been done: 

The artificial organism are not trapped in virtual systems anymore, they can 
finally move freely - they took the redpill... 
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A. Appendix 



This artificial evolution system can be started on every common Windows Operation System. 
Even it is a chaotic process, due to the guard files the process can run for hours without a 
breakdown of the system. 



A.l. The package 

The package: 

binary\runOndgens.bat: This script starts all guard files, then starts the 0th generation. 
Adjust the hardcoded path in the file to the directory of the guard files. This file has to be 
at H:, you can use subst for that. 

binary\NewArt . exe : This is the 0th generation. It has to be started with the shell (or a 
.bat file) - not via double-click. It is highly recommented to not run the file without the 
guard files. This file has to be as H: aswell. 

ProcessWatcher\* . * : This directory contains the binary and source of all guard files. 

ProcessWatcher\CopyPopulation.cpp: This file copies every 3 minutes 10% of the 
population to a specific path given in the source. This path has to be adjusted before usage. 

ProcessWatcher\Dead.cpp: This program can be used to manually stop all organism. 
You can enter a probability of how many organism should be survive. For instance, if you 
enter 10, 90% of the population will be terminated - 10% survive. 

ProcessWatcher\DoubleProcess . cpp: This program searchs and destroyes multiple 



instances of the same file. See Chapter 3.2 for more information. 

ProcessWatcher\EndLessLoops . cpp : This guard file searchs for endless loops in the 
memory and terminates them. 



ProcessWatclier\JustMutation. cpp: Also descriped in chapter 3.2 this program searchs 



and terminates clones in the process. 

ProcessWatcher\Kill2MuchProcess . cpp : This guard is very important for stability of 
the operation system while running the experiment. If there are more than 350 processes 
running, it terminates 75% of them. 

ProcessWatcher\RenioveCorpus . cpp : As space is restriced, this guard deletes files that 
are older than a 30scc. 

ProcessWatcher\SearcliAndDestroy . cpp : This program removes error messages (by 
clicking at "OK"), terminates error-processes (as dwwin.exe or drwtsn32.exe), and it 
terminates dead processes (processes that are older than lOOscc). 

ProcessWatcher\malformed_PEn.exe: These are two malformed .EXE files, which will be 
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called by SearchAiidDestroy.exe at the start to find the "OK"-Button. They have to be 
in the directory of the guard files. 

Analyse\SingleFileAnalyse : This directory contains an analyse file, that compaires the 
bytecode of two genotypes. Copy the file to the directory, change the name in the source 
and execute it. 

Aiialyse\Relation: This file compaires gives you the Hamming distance of all .exe files. 

Analyse\MutationDistribution: With this file you can get a distribution of all mutations 
compaired with NewArt.exe. 



A. 2. Running the experiment 

Copy the runOndgens.bat and NewArt.exe to H:\. Adjust the path in 
CopyPopulation. cpp to the backup directory (and compile it) and in runOndgens.bat 
to the directory of the guard files. 

Now you can start runOndgens .bat, move over the two error-messages (dont click them, 
this will be done by SearcliAndDestroy.exe). Then you are ready and can press a key in 
the runOndgens.bat, which will start 10 instances of NewArt.exe. 




Running Experiment: This is how the experiment should look like 



